A Toolbox for Record Linkage
Abstract
We developed a record-linkage toolbox to compare the performance of various string-similarity measures for German surnames. This "Matching Tool-Box" (MTB) consists of independent, highly portable Java programs. MTB is currently used for prototyping pre-processing tools and for the empirical comparison of string-similarity measures. Furthermore, MTB has been used successfully in sociological, economic and epidemiological research projects.
Summary (translated from German): To compare how well various similarity measures for error-prone names also work for German names, we developed a "Matching Tool-Box" (MTB). MTB consists of several portable Java programs. It is used to develop pre-processing tools and to compare string-similarity measures. MTB has been used successfully in research projects in the social sciences, economics and epidemiology.
Similar resources
TAILOR: A Record Linkage Tool Box
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS
Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system which runs on an IBM mainframe for this purpose. With changes in technology, the need has arisen for portability between platforms, integration with client/server te...
A Novel Toolbox for Generating Realistic Biological Cell Geometries for Electromagnetic Microdosimetry
Researchers in bioelectromagnetics often require realistic tissue, cellular and sub-cellular geometry models for their simulations. However, biological shapes are often extremely irregular, while conventional geometrical modeling tools on the market cannot meet the demand for fast and efficient construction of irregular geometries. We have designed a free, user-friendly tool in MATLAB that comb...
A Decision Tree Based Record Linkage for Recommendation Systems
Record linkage merges all the records relating to the same entity from multiple datasets, at the entity level. It is the initial data preparation phase for most database projects. Traditionally, one-to-one data linkage is performed among entities of the same type with a common unique identifier. The proposed one-to-many and/or many-to-many record linkage method is able to link the entities ...